Grid-Based Colocation Mining Algorithms on GPU for Big Spatial Event Data: A Summary of Results

Authors

  • Arpan Man Sainju
  • Zhe Jiang
Abstract

This paper investigates the colocation pattern mining problem for big spatial event data. Colocation patterns refer to subsets of spatial features whose instances are frequently located together. The problem is important in many applications, such as analyzing relationships of crimes or disease with various environmental factors, but it is computationally challenging due to the large number of instances, the potentially exponential number of candidate patterns, and the high computational cost of generating pattern instances. Existing colocation mining algorithms (e.g., Apriori algorithm, multi-resolution filter, partial join and joinless approaches) are mostly sequential, and thus can be insufficient for big spatial event data. Recently, parallel colocation mining algorithms have been developed based on the MapReduce framework. However, these algorithms need a large number of nodes to scale up, which is economically expensive, and their reducer nodes become a bottleneck when aggregating all instances of the same colocation pattern. Another work proposes a parallel colocation mining algorithm on GPU based on the iCPI tree and the joinless approach, but it assumes that the number of neighbors for each instance is bounded by a small constant, and thus may be inefficient when instances are dense and unevenly distributed. To address these limitations, we propose a grid-based GPU colocation mining algorithm that includes a novel cell-aggregate-based upper bound filter and two refinement algorithms. We prove the correctness and completeness of the proposed GPU algorithms. Preliminary results on both real-world data and synthetic data show that the proposed GPU algorithms are promising, with over 30 times speedup on up to millions of instances.
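The abstract does not spell out the filter, so the following is only a minimal sequential sketch of the general idea of a grid with a cell-aggregate upper bound, not the authors' GPU algorithm. It assumes the commonly used participation index (the minimum, over the features in a pattern, of the fraction of that feature's instances that take part in at least one pattern instance), a Euclidean distance threshold `d`, square cells of side `d`, and patterns of size two. All function names are hypothetical. With cells of side `d`, any neighbor of an instance lies in the 3x3 block of cells around it, so per-cell feature counts alone give an upper bound that can prune a candidate before any instance pairs are generated.

```python
from collections import defaultdict
from itertools import product
from math import floor, hypot


def build_grid(points, d):
    """Map cell -> feature -> list of (index, x, y); cell side equals d (assumption)."""
    grid = defaultdict(lambda: defaultdict(list))
    for idx, (feature, x, y) in enumerate(points):
        grid[(floor(x / d), floor(y / d))][feature].append((idx, x, y))
    return grid


def neighbor_cells(cell):
    """The 3x3 block of cells around (and including) the given cell."""
    cx, cy = cell
    return [(cx + dx, cy + dy) for dx, dy in product((-1, 0, 1), repeat=2)]


def upper_bound_pi(grid, f1, f2, totals):
    """Upper bound on the participation index of {f1, f2} from cell counts only."""
    def bound(a, b):
        count = 0
        for cell, feats in grid.items():
            # Instances of `a` in this cell can participate only if some
            # cell in the 3x3 block holds at least one instance of `b`.
            if a in feats and any(b in grid.get(n, {}) for n in neighbor_cells(cell)):
                count += len(feats[a])
        return count / totals[a]
    return min(bound(f1, f2), bound(f2, f1))


def exact_pi(grid, f1, f2, totals, d):
    """Refinement: exact participation index via distance checks in neighbor cells."""
    participating = {f1: set(), f2: set()}
    for cell, feats in grid.items():
        for i, x1, y1 in feats.get(f1, []):
            for n in neighbor_cells(cell):
                for j, x2, y2 in grid.get(n, {}).get(f2, []):
                    if hypot(x1 - x2, y1 - y2) <= d:
                        participating[f1].add(i)
                        participating[f2].add(j)
    return min(len(participating[f]) / totals[f] for f in (f1, f2))


# Illustrative data (made up): (feature, x, y)
points = [("A", 0.5, 0.5), ("B", 0.9, 0.6), ("A", 5.5, 5.5),
          ("B", 6.9, 6.9), ("A", 20.0, 20.0)]
d = 1.0
grid = build_grid(points, d)
totals = {"A": 3, "B": 2}

ub = upper_bound_pi(grid, "A", "B", totals)
pi = exact_pi(grid, "A", "B", totals, d)
print(ub, pi)
```

In a filter-and-refine loop, a candidate whose upper bound falls below the prevalence threshold can be discarded without running the more expensive refinement step; the bound is never smaller than the exact value because it counts every instance that could possibly have a neighbor of the other feature.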


Similar resources

A comprehensive benchmark between two filter-based multiple-point simulation algorithms

Computer graphics offer various tools to enhance the reconstruction of high-order statistics that are not correctly addressed by the two-point statistics approaches. Almost all the newly developed multiple-point geostatistics (MPS) algorithms, to some extent, adapt these techniques to increase the simulation accuracy and efficiency. In this work, a detailed comparison between our recently dev...


Clustering Assisted Co-location Pattern Mining for Spatial Data

The importance of spatial data mining is growing with the increasing incidence and importance of large spatial dataset repositories of remote-sensing images, location-based mobile app data, satellite imagery, medical data and crime data with location information, three-dimensional maps, traffic data and many more. However, as classical data mining techniques are often inadequate for spatial da...


A Data Colocation Grid Framework for Big Data Medical Image Processing - Backend Design

When processing large medical imaging studies, adopting high performance grid computing resources rapidly becomes important. We recently presented a "medical image processing-as-a-service" grid framework that offers promise in utilizing the Apache Hadoop ecosystem and HBase for data colocation by moving computation close to medical image storage. However, the framework has not yet proven to be ...


Spatio-Temporal Big Data Analytics for Environmental Health

The framework for our proposed big data analytics platform is shown in Figure 1. Two complementary systems support the wide variety of spatial analytics algorithms and techniques we are providing. On the left half of Figure 1, the more traditional Unix filesystem supports high-throughput computation (e.g., MPI [Snir et al., 1995], OpenMP [Dagum and Menon, 1998], GPGPU/CUDA [Luebke et al., 2006])...


Implementation of the direction of arrival estimation algorithms by means of GPU-parallel processing in the CUDA environment (Research Article)

Direction-of-arrival (DOA) estimation of audio signals is critical in different areas, including electronic warfare, sonar, etc. Beamforming methods such as Minimum Variance Distortionless Response (MVDR) and Delay-and-Sum (DAS), and the subspace-based Multiple Signal Classification (MUSIC), are the best-known DOA estimation techniques. These methods have high computational complexity. Hence using...



Publication date: 2017